-
-
-
- FLEX User Commands FLEX
-
-
-
- NAME
- flex - fast lexical analyzer generator
-
- SYNOPSIS
- flex [ -bdfipstvFILT -c[efmF] -Sskeleton_file ] [ filename ]
-
- DESCRIPTION
- flex is a rewrite of lex intended to right some of that
- tool's deficiencies: in particular, flex generates lexical
- analyzers much faster, and the analyzers use smaller tables
- and run faster.
-
- OPTIONS
- In addition to lex's -t flag, flex has the following
- options:
-
- -b Generate backtracking information to lex.backtrack.
- This is a list of scanner states which require
- backtracking and the input characters on which they do so.
- By adding rules one can remove backtracking states. If
- all backtracking states are eliminated and -f or -F is
- used, the generated scanner will run faster (see the -p
- flag). Only users who wish to squeeze every last cycle
- out of their scanners need worry about this option.
-
- -d makes the generated scanner run in debug mode.
- Whenever a pattern is recognized the scanner will write to
- stderr a line of the form:
-
- --accepting rule #n
-
- Rules are numbered sequentially with the first one
- being 1. Rule #0 is executed when the scanner
- backtracks; Rule #(n+1) (where n is the number of rules)
- indicates the default action; Rule #(n+2) indicates
- that the input buffer is empty and needs to be refilled
- and then the scan restarted. Rules beyond (n+2) are
- end-of-file actions.
-
- -f has the same effect as lex's -f flag (do not compress
- the scanner tables); the mnemonic changes from fast
- compilation to (take your pick) full table or fast
- scanner. The actual compilation takes longer, since
- flex is I/O bound writing out the big table.
-
- This option is equivalent to -cf (see below).
-
- -i instructs flex to generate a case-insensitive scanner.
- The case of letters given in the flex input patterns
- will be ignored, and the rules will be matched
- regardless of case. The matched text given in yytext will
- have the preserved case (i.e., it will not be folded).
-
-
-
- Version 2.1 20 June 1989 1
-
- -p generates a performance report to stderr. The report
- consists of comments regarding features of the flex
- input file which will cause a loss of performance in
- the resulting scanner. Note that the use of REJECT and
- variable trailing context (see BUGS) entails a
- substantial performance penalty; use of yymore(), the ^
- operator, and the -I flag entail minor performance
- penalties.
-
- -s causes the default rule (that unmatched scanner input
- is echoed to stdout) to be suppressed. If the scanner
- encounters input that does not match any of its rules,
- it aborts with an error. This option is useful for
- finding holes in a scanner's rule set.
-
- -v has the same meaning as for lex (print to stderr a
- summary of statistics of the generated scanner). Many
- more statistics are printed, though, and the summary
- spans several lines. Most of the statistics are
- meaningless to the casual flex user, but the first line
- identifies the version of flex, which is useful for
- figuring out where you stand with respect to patches
- and new releases.
-
- -F specifies that the fast scanner table representation
- should be used. This representation is about as fast
- as the full table representation (-f), and for some
- sets of patterns will be considerably smaller (and for
- others, larger). In general, if the pattern set
- contains both "keywords" and a catch-all, "identifier"
- rule, such as in the set:
-
- "case" return ( TOK_CASE );
- "switch" return ( TOK_SWITCH );
- ...
- "default" return ( TOK_DEFAULT );
- [a-z]+ return ( TOK_ID );
-
- then you're better off using the full table
- representation. If only the "identifier" rule is present and you
- then use a hash table or some such to detect the
- keywords, you're better off using -F.
-
- This option is equivalent to -cF (see below).
-
- -I instructs flex to generate an interactive scanner.
- Normally, scanners generated by flex always look ahead
- one character before deciding that a rule has been
- matched. At the cost of some scanning overhead, flex
- will generate a scanner which only looks ahead when
- needed. Such scanners are called interactive because
- if you want to write a scanner for an interactive
- system such as a command shell, you will probably want
- the user's input to be terminated with a newline, and
- without -I the user will have to type a character in
- addition to the newline in order to have the newline
- recognized. This leads to dreadful interactive
- performance.
-
- If all this seems too confusing, here's the general
- rule: if a human will be typing in input to your
- scanner, use -I, otherwise don't; if you don't care
- about how fast your scanners run and don't want to make
- any assumptions about the input to your scanner, always
- use -I.
-
- Note, -I cannot be used in conjunction with full or
- fast tables, i.e., the -f, -F, -cf, or -cF flags.
-
- -L instructs flex to not generate #line directives (see
- below).
-
- -T makes flex run in trace mode. It will generate a lot
- of messages to stdout concerning the form of the input
- and the resultant non-deterministic and deterministic
- finite automatons. This option is mostly for use in
- maintaining flex.
-
- -c[efmF]
- controls the degree of table compression. -ce directs
- flex to construct equivalence classes, i.e., sets of
- characters which have identical lexical properties (for
- example, if the only appearance of digits in the flex
- input is in the character class "[0-9]" then the digits
- '0', '1', ..., '9' will all be put in the same
- equivalence class). -cf specifies that the full
- scanner tables should be generated - flex should not
- compress the tables by taking advantage of similar
- transition functions for different states. -cF
- specifies that the alternate fast scanner representation
- (described above under the -F flag) should be used.
- -cm directs flex to construct meta-equivalence classes,
- which are sets of equivalence classes (or characters,
- if equivalence classes are not being used) that are
- commonly used together. A lone -c specifies that the
- scanner tables should be compressed but neither
- equivalence classes nor meta-equivalence classes should
- be used.
-
- The options -cf or -cF and -cm do not make sense
- together - there is no opportunity for meta-equivalence
- classes if the table is not being compressed.
- Otherwise the options may be freely mixed.
-
- The default setting is -cem which specifies that flex
- should generate equivalence classes and
- meta-equivalence classes. This setting provides the highest
- degree of table compression. You can get
- faster-executing scanners at the cost of larger tables,
- with the following generally being true:
-
- slowest smallest
- -cem
- -ce
- -cm
- -c
- -c{f,F}e
- -c{f,F}
- fastest largest
-
- Note that scanners with the smallest tables compile the
- quickest, so during development you will usually want
- to use the default, maximal compression.
-
- -Sskeleton_file
- overrides the default skeleton file from which flex
- constructs its scanners. You'll never need this option
- unless you are doing flex maintenance or development.
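-
- As a rough illustration of the equivalence classes that
- -ce asks for, the following C sketch groups characters
- that belong to exactly the same character classes. It is
- not flex's own algorithm: the function name, the fixed
- 256-character alphabet, and the choice to describe each
- character class as a string of its members are all
- assumptions made for this example.
-
```c
#include <string.h>

#define NCHARS 256

/*
 * Hypothetical helper, not part of flex: assign an equivalence-class
 * number to every character.  Two characters share a class when they
 * are members of exactly the same character classes.  Each character
 * class is given as a NUL-terminated string listing its members.
 * Returns the number of equivalence classes found.
 */
static int build_equiv_classes(const char *const sets[], int nsets,
                               int ec[NCHARS])
{
    int nclasses = 0;
    int c, d, s;

    for (c = 0; c < NCHARS; c++)
        ec[c] = -1;                     /* not yet assigned */

    for (c = 0; c < NCHARS; c++) {
        if (ec[c] != -1)
            continue;
        ec[c] = nclasses;
        for (d = c + 1; d < NCHARS; d++) {
            int same = 1;
            if (ec[d] != -1)
                continue;
            for (s = 0; s < nsets && same; s++) {
                /* membership test; character 0 is never a member */
                int in_c = c != 0 && strchr(sets[s], c) != NULL;
                int in_d = d != 0 && strchr(sets[s], d) != NULL;
                same = (in_c == in_d);
            }
            if (same)
                ec[d] = nclasses;
        }
        nclasses++;
    }
    return nclasses;
}
```
-
- Given the single class "0123456789", every digit receives
- the same class number, which is the situation described
- for "[0-9]" under the -c option. A table indexed by class
- number rather than by character then needs far fewer
- columns.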
-
- INCOMPATIBILITIES WITH LEX
- flex is fully compatible with lex with the following
- exceptions:
-
- - There is no run-time library to link with. You needn't
- specify -ll when linking, and you must supply a main
- program. (Hacker's note: since the lex library
- contains a main() which simply calls yylex(), you actually
- can be lazy and not supply your own main program and
- link with -ll.)
-
- - lex's %r (Ratfor scanners) and %t (translation table)
- options are not supported.
-
- - The do-nothing -n flag is not supported.
-
- - When definitions are expanded, flex encloses them in
- parentheses. With lex, the following
-
- NAME [A-Z][A-Z0-9]*
- %%
- foo{NAME}? printf( "Found it\n" );
- %%
-
- will not match the string "foo" because when the macro
- is expanded the rule is equivalent to
- "foo[A-Z][A-Z0-9]*?" and the precedence is such that the '?' is
- associated with "[A-Z0-9]*". With flex, the rule will
- be expanded to "foo([A-Z][A-Z0-9]*)?" and so the string
- "foo" will match. Note that because of this, the ^, $,
- <s>, and / operators cannot be used in a definition.
-
- - The undocumented lex-scanner internal variable yylineno
- is not supported.
-
- - The input() routine is not redefinable, though it may be
- called to read characters following whatever has been
- matched by a rule. If input() encounters an
- end-of-file the normal yywrap() processing is done. A
- ``real'' end-of-file is returned as EOF.
-
- Input can be controlled by redefining the YY_INPUT
- macro. YY_INPUT's calling sequence is
- "YY_INPUT(buf,result,max_size)". Its action is to
- place up to max_size characters in the character buffer
- "buf" and return in the integer variable "result"
- either the number of characters read or the constant
- YY_NULL (0 on Unix systems) to indicate EOF.
- The default YY_INPUT reads from the file-pointer "yyin"
- (which is by default stdin), so if you just want to
- change the input file, you needn't redefine YY_INPUT -
- just point yyin at the input file.
-
- A sample redefinition of YY_INPUT (in the first section
- of the input file):
-
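- %{
- #undef YY_INPUT
- /* getchar() returns an int; test for EOF before storing into buf */
- #define YY_INPUT(buf,result,max_size) \
- { \
- int c = getchar(); \
- result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
- }
- %}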
-
- You can also add things like keeping track
- of the input line number this way; but don't expect
- your scanner to go very fast.
-
- - output() is not supported. Output from the ECHO macro
- is done to the file-pointer "yyout" (default stdout).
-
- - If you are providing your own yywrap() routine, you
- must "#undef yywrap" first.
-
- - To refer to yytext outside of your scanner source file,
- use "extern char *yytext;" rather than "extern char
- yytext[];".
-
- - yyleng is a macro and not a variable, and hence cannot
- be accessed outside of the scanner source file.
-
- - flex reads only one input file, while lex's input is
- made up of the concatenation of its input files.
-
- - The name FLEX_SCANNER is #define'd so scanners may be
- written for use with either flex or lex.
-
- - The macro YY_USER_ACTION can be redefined to provide an
- action which is always executed prior to the matched
- rule's action. For example, it could be #define'd to
- call a routine to convert yytext to lower-case, or to
- copy yyleng to a global variable to make it accessible
- outside of the scanner source file.
-
- - In the generated scanner, rules are separated using
- YY_BREAK instead of simple "break"'s. This allows, for
- example, C++ users to #define YY_BREAK to do nothing
- (while being very careful that every rule ends with a
- "break" or a "return"!) to avoid suffering from
- unreachable statement warnings where a rule's action
- ends with "return".
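-
- The YY_INPUT contract described above (place up to
- max_size characters in buf, and set result to the count
- or to YY_NULL at end of input) can be sketched for a
- scanner that reads from a string held in memory. The
- names my_set_input, my_fill, my_input_ptr, and
- my_input_left are hypothetical, and YY_NULL is defined
- here as 0 only so that the fragment compiles on its own.
- In a flex input file the code would presumably sit
- between %{ and %} in the first section.
-
```c
#include <stddef.h>
#include <string.h>

#ifndef YY_NULL
#define YY_NULL 0   /* assumption: the value the manual gives for Unix */
#endif

/* Hypothetical state: the text still to be handed to the scanner. */
static const char *my_input_ptr = "";
static size_t my_input_left = 0;

/* Point the reader at a new NUL-terminated string. */
static void my_set_input(const char *s)
{
    my_input_ptr = s;
    my_input_left = strlen(s);
}

/* Copy up to max_size characters into buf; YY_NULL means end of input. */
static int my_fill(char *buf, int max_size)
{
    size_t room = max_size > 0 ? (size_t)max_size : 0;
    size_t n = my_input_left < room ? my_input_left : room;

    memcpy(buf, my_input_ptr, n);
    my_input_ptr += n;
    my_input_left -= n;
    return n == 0 ? YY_NULL : (int)n;
}

#undef YY_INPUT
#define YY_INPUT(buf, result, max_size) \
    ((result) = my_fill((buf), (max_size)))
```
-
- Handing back a whole block per call, rather than one
- character as in the getchar() sample, avoids most of the
- speed penalty that the manual warns about.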
-
- ENHANCEMENTS
- - Exclusive start-conditions can be declared by using %x
- instead of %s. These start-conditions have the property
- that when they are active, no other rules are active.
- Thus a set of rules governed by the same exclusive
- start condition describes a scanner which is independent
- of any of the other rules in the flex input. This
- feature makes it easy to specify "mini-scanners" which
- scan portions of the input that are syntactically
- different from the rest (e.g., comments).
-
- - yyterminate() can be used in lieu of a return statement
- in an action. It terminates the scanner and returns a
- 0 to the scanner's caller, indicating "all done".
-
- - End-of-file rules. The special rule "<<EOF>>" indicates
- actions which are to be taken when an end-of-file is
- encountered and yywrap() returns non-zero (i.e.,
- indicates no further files to process). The action can
- either point yyin at a new file to process, in which
- case the action should finish with YY_NEW_FILE (this is
- a branch, so subsequent code in the action won't be
- executed), or it should finish with a return statement.
- <<EOF>> rules may not be used with other patterns; they
- may only be qualified with a list of start conditions.
- If an unqualified <<EOF>> rule is given, it applies
- only to the INITIAL start condition, and not to %s
- start conditions. These rules are useful for catching
- things like unclosed comments. An example:
-
- %x quote
- %%
- ...
- <quote><<EOF>> {
- error( "unterminated quote" );
- yyterminate();
- }
- <<EOF>> {
- yyin = fopen( next_file, "r" );
- YY_NEW_FILE;
- }
-
-
- - flex dynamically resizes its internal tables, so direc-
- tives like "%a 3000" are not needed when specifying
- large scanners.
-
- - The scanning routine generated by flex is declared
- using the macro YY_DECL. By redefining this macro you
- can change the routine's name and its calling sequence.
- For example, you could use:
-
- #undef YY_DECL
- #define YY_DECL float lexscan( a, b ) float a, b;
-
- to give it the name lexscan, returning a float, and
- taking two floats as arguments. Note that if you give
- arguments to the scanning routine, you must terminate
- the definition with a semi-colon (;).
-
- - flex generates #line directives mapping lines in the
- output to their origin in the input file.
-
- - You can put multiple actions on the same line,
- separated with semi-colons. With lex, the following
-
- foo handle_foo(); return 1;
-
- is truncated to
-
- foo handle_foo();
-
- flex does not truncate the action. Actions that are
- not enclosed in braces are terminated at the end of the
- line.
-
- - Actions can be begun with %{ and terminated with %}. In
- this case, flex does not count braces to figure out
- where the action ends - actions are terminated by the
- closing %}. This feature is useful when the enclosed
- action has extraneous braces in it (usually in comments
- or inside inactive #ifdef's) that throw off the
- brace-count.
-
- - All of the scanner actions (e.g., ECHO, yywrap ...)
- except the unput() and input() routines, are written as
- macros, so they can be redefined if necessary without
- requiring a separate library to link to.
-
- - When yywrap() indicates that the scanner is done
- processing (it does this by returning non-zero), on
- subsequent calls the scanner will always immediately return
- a value of 0. To restart it on a new input file, the
- action yyrestart() is used. It takes one argument, the
- new input file. It closes the previous yyin (unless
- stdin) and sets up the scanner's internal variables so
- that the next call to yylex() will start scanning the
- new file. This functionality is useful for, e.g.,
- programs which will process a file, do some work, and then
- get a message to parse another file.
-
- - Flex scans the code in section 1 (inside %{}'s) and the
- actions for occurrences of REJECT and yymore(). If it
- doesn't see any, it assumes the features are not used
- and generates higher-performance scanners. Flex tries
- to be correct in identifying uses but can be fooled
- (for example, if a reference is made in a macro from a
- #include file). If this happens (a feature is used and
- flex didn't realize it) you will get a compile-time
- error of the form
-
- reject_used_but_not_detected undefined
-
- You can tell flex that a feature is used even if it
- doesn't think so with %used followed by the name of the
- feature (for example, "%used REJECT"); similarly, you
- can specify that a feature is not used even though it
- thinks it is with %unused.
-
- - Comments may be put in the first section of the input
- by preceding them with '#'.
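-
- To make the idea of an exclusive start condition, and of
- an <<EOF>> rule qualified by one, more concrete, here is a
- hand-written C sketch of the same control flow for
- stripping comments. It is not flex output. The names
- start_cond, SC_INITIAL, SC_COMMENT, and strip_comments
- and the return codes are all assumptions chosen for the
- example.
-
```c
#include <stddef.h>

/* Hypothetical analogue of two start conditions: INITIAL and an
 * exclusive "comment" condition of the kind declared with %x. */
enum start_cond { SC_INITIAL, SC_COMMENT };

/*
 * Copy `in` to `out` (capacity `outsz`, including the terminating NUL)
 * while dropping C-style comments.  While SC_COMMENT is active, only
 * the comment rules are consulted, which is what makes the condition
 * exclusive.  Returns 0 on success, -1 if the input ends inside a
 * comment (the case a start-condition-qualified end-of-file rule would
 * catch), and -2 if `out` is too small.
 */
static int strip_comments(const char *in, char *out, size_t outsz)
{
    enum start_cond sc = SC_INITIAL;
    size_t n = 0;

    if (outsz == 0)
        return -2;

    while (*in != '\0') {
        if (sc == SC_COMMENT) {
            if (in[0] == '*' && in[1] == '/') {
                sc = SC_INITIAL;        /* leave the mini-scanner */
                in += 2;
            } else {
                in++;                   /* discard comment text */
            }
        } else if (in[0] == '/' && in[1] == '*') {
            sc = SC_COMMENT;            /* enter the mini-scanner */
            in += 2;
        } else {
            if (n + 1 >= outsz)
                return -2;
            out[n++] = *in++;           /* like the default rule: echo */
        }
    }
    out[n] = '\0';
    return sc == SC_COMMENT ? -1 : 0;
}
```
-
- The point of the sketch is that the ordinary copying
- branch is never reached while the comment state is
- active, so the comment rules form a self-contained
- scanner of their own.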
-
- FILES
- flex.skel
- skeleton scanner
-
- lex.yy.c
- generated scanner (called lexyy.c on some systems).
-
- lex.backtrack
- backtracking information for -b flag (called lex.bck on
- some systems).
-
- SEE ALSO
- lex(1)
-
- M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator
-
- AUTHOR
- Vern Paxson, with the help of many ideas and much inspira-
- tion from Van Jacobson. Original version by Jef Poskanzer.
- Fast table representation is a partial implementation of a
- design done by Van Jacobson. The implementation was done by
- Kevin Gong and Vern Paxson.
-
- Thanks to the many flex beta-testers and feedbackers, espe-
- cially Casey Leedom, Frederic Brehm, Nick Christopher, Chris
- Faylor, Eric Goldman, Eric Hughes, Greg Lee, Craig Leres,
- Mohamed el Lozy, Jim Meyering, Esmond Pitt, Jef Poskanzer,
- and Dave Tallman. Thanks to Keith Bostic, John Gilmore, Bob
- Mulcahy, Rich Salz, and Richard Stallman for help with vari-
- ous distribution headaches.
-
- Send comments to:
-
- Vern Paxson
- Real Time Systems
- Bldg. 46A
- Lawrence Berkeley Laboratory
- 1 Cyclotron Rd.
- Berkeley, CA 94720
-
- (415) 486-6411
-
- vern@csam.lbl.gov
- vern@rtsg.ee.lbl.gov
- ucbvax!csam.lbl.gov!vern
-
- I will be gone from mid-July '89 through mid-August '89.
- From August on, the addresses are:
-
- vern@cs.cornell.edu
-
- Vern Paxson
- CS Department
- Grad Office
- 4126 Upson
- Cornell University
- Ithaca, NY 14853-7501
-
- <no phone number yet>
-
- Email sent to the former addresses should continue to be
- forwarded for quite a while. Also, it looks like my user-
- name will be "paxson" and not "vern". I'm planning on hav-
- ing a mail alias set up so "vern" will still work, but if
- you encounter problems try "paxson".
-
- DIAGNOSTICS
- flex scanner jammed - a scanner compiled with -s has
- encountered an input string which wasn't matched by any of its
- rules.
-
- flex input buffer overflowed - a scanner rule matched a
- string long enough to overflow the scanner's internal input
- buffer (16K bytes - controlled by YY_BUF_MAX in
- "flex.skel").
-
- old-style lex command ignored - the flex input contains a
- lex command (e.g., "%n 1000") which is being ignored.
-
- BUGS
- Some trailing context patterns cannot be properly matched
- and generate warning messages ("Dangerous trailing
- context"). These are patterns where the ending of the first
- part of the rule matches the beginning of the second part,
- such as "zx*/xy*", where the 'x*' matches the 'x' at the
- beginning of the trailing context. (Lex doesn't get these
- patterns right either.) If desperate, you can use yyless()
- to effect arbitrary trailing context.
-
- Variable trailing context (where both the leading and
- trailing parts do not have a fixed length) entails the same
- performance loss as REJECT (i.e., substantial).
-
- For some trailing context rules, parts which are actually
- fixed-length are not recognized as such, leading to the
- abovementioned performance loss. In particular, parts using
- '|' or {n} are always considered variable-length.
-
- Use of unput() or input() trashes the current yytext and
- yyleng.
-
- Use of unput() to push back more text than was matched can
- result in the pushed-back text matching a beginning-of-line
- ('^') rule even though it didn't come at the beginning of
- the line.
-
- yytext and yyleng cannot be modified within a flex action.
-
- Nulls are not allowed in flex inputs or in the inputs to
- scanners generated by flex. Their presence generates fatal
- errors.
-
- Flex does not generate correct #line directives for code
- internal to the scanner; thus, bugs in flex.skel yield bogus
- line numbers.
-
- Pushing back definitions enclosed in ()'s can result in
- nasty, difficult-to-understand problems like:
-
- {DIG} [0-9] /* a digit */
-
- In which the pushed-back text is "([0-9] /* a digit */)".
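-
- The effect can be mimicked with a few lines of C that
- wrap a definition's text in parentheses, in a much
- simplified way. This is only a model of the behavior
- described here, not flex's own code, and the function
- name and buffer handling are assumptions made for
- illustration.
-
```c
#include <string.h>

/*
 * Hypothetical model of definition expansion: everything that follows
 * the definition's name, trailing comment included, is taken as the
 * definition text and wrapped in parentheses.  Writes the result to
 * `out` and returns 0, or returns -1 if `out` (capacity `outsz`) is
 * too small.
 */
static int expand_definition(const char *text, char *out, size_t outsz)
{
    size_t len = strlen(text);

    if (len + 3 > outsz)                /* "(" + text + ")" + NUL */
        return -1;
    out[0] = '(';
    memcpy(out + 1, text, len);
    out[len + 1] = ')';
    out[len + 2] = '\0';
    return 0;
}
```
-
- Because the comment is part of the text being wrapped, it
- ends up inside the parentheses and is then scanned as
- part of the pattern.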
-
- Due to both buffering of input and read-ahead, you cannot
- intermix calls to stdio routines, such as getchar(),
- with flex rules and expect it to work. Call
- input() instead.
-
- The total number of table entries listed by the -v flag
- excludes the number of table entries needed to determine
- what rule has been matched. The number of entries is equal
- to the number of DFA states if the scanner does not use
- REJECT, and somewhat greater than the number of states if
- it does.
-
- To be consistent with ANSI C, the escape sequence \xhh
- should be recognized for hexadecimal escape sequences, such
- as '\x41' for 'A'.
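-
- A decoder for the \xhh form might look like the following
- C sketch. It only illustrates the escape syntax and is
- not code from flex. The function name is hypothetical,
- and accepting at most two hexadecimal digits is an
- assumption read off the "hh" in the text (ANSI C itself
- places no such limit on the number of digits).
-
```c
#include <ctype.h>

/*
 * Hypothetical helper: if `s` starts with "\x" followed by one or two
 * hexadecimal digits, store the character value in *value and return
 * the number of characters consumed; otherwise return 0.
 */
static int decode_hex_escape(const char *s, int *value)
{
    int i = 2, v = 0;

    if (s[0] != '\\' || s[1] != 'x' || !isxdigit((unsigned char)s[2]))
        return 0;

    while (i < 4 && isxdigit((unsigned char)s[i])) {
        int c = (unsigned char)s[i];
        int digit = isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;

        v = v * 16 + digit;
        i++;
    }
    *value = v;
    return i;
}
```
-
- Called on the four characters \x41, the helper consumes
- all four and yields the value 0x41, which is 'A' in ASCII.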
-
- It would be useful if flex wrote to lex.yy.c a summary of
- the flags used in its generation (such as which table
- compression options).
-
- The scanner run-time speeds still have not been optimized as
- much as they deserve. Van Jacobson's work shows that they
- can go faster still.
-
- The utility needs more complete documentation.